Fast Convergence of Markov Chain Monte Carlo Algorithms for Phylogenetic Reconstruction with Homogeneous Data on Closely Related Species

Authors

  • Daniel Stefankovic
  • Eric Vigoda
Abstract

This paper studies a Markov chain for phylogenetic reconstruction that uses a popular transition between tree topologies known as subtree pruning-and-regrafting (SPR). We analyze the Markov chain in the simpler setting where the generating tree has very short edge lengths, short enough that each sample from the generating tree (a character, in phylogenetic terminology) is likely to have only one mutation, and where there are enough samples that the data looks like the generating distribution. We prove that in this setting the Markov chain is rapidly mixing, i.e., it quickly converges to its stationary distribution, which is the posterior distribution over tree topologies. Our proofs use the fact that the leading term of the maximum likelihood function of a tree T is the maximum parsimony score, which is the size of the minimum cut in T needed to realize single-edge cuts of the generating tree. Our main contribution is a combinatorial proof that, in our simplified setting, SPR moves are guaranteed to converge quickly to the maximum parsimony tree. Our results contrast with recent work showing examples with heterogeneous data (namely, data generated from a mixture distribution) where many natural Markov chains are exponentially slow to converge to the stationary distribution.
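To make these ingredients concrete, here is a small Python sketch; it is not the authors' code and makes several illustrative assumptions. Trees are scored with Fitch's algorithm, a standard way to compute the parsimony score. For a binary character, that score equals the minimum number of tree edges whose removal separates the 0-leaves from the 1-leaves, which is the minimum-cut view described in the abstract. Characters are generated as single mutations on the internal edges of a 6-leaf caterpillar tree, mimicking the short-edge regime. The chain proposes a uniformly random SPR move and accepts it with a Metropolis-Hastings rule targeting pi(T) proportional to exp(-beta * PS(T)). This parsimony-based target is a hypothetical stand-in for the likelihood-based posterior. The value beta = 3.0, the node names, and the helpers (`caterpillar6`, `spr_moves`, `apply_spr`, `mh_step`) are likewise hypothetical.

```python
import math
import random

# Unrooted binary trees as dict: node -> set of neighbours (hypothetical encoding).
# Leaves ("A".."F") have degree 1; internal nodes ("u1".."u4") have degree 3.

def caterpillar6(order):
    """Hypothetical helper: the 6-leaf caterpillar ((a,b),c,d,(e,f))."""
    a, b, c, d, e, f = order
    adj = {}
    for v, w in [(a, "u1"), (b, "u1"), ("u1", "u2"), (c, "u2"), ("u2", "u3"),
                 (d, "u3"), ("u3", "u4"), (e, "u4"), (f, "u4")]:
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
    return adj

def fitch_score(adj, char):
    """Fitch parsimony score of one binary character (dict leaf -> 0/1); equal to
    the minimum number of edges separating the 0-leaves from the 1-leaves."""
    root = next(iter(char))  # root at a leaf so every internal node has 2 children
    (top,) = adj[root]
    changes = 0

    def states(v, parent):
        nonlocal changes
        if v in char:
            return {char[v]}
        left, right = (states(w, v) for w in adj[v] if w != parent)
        if left & right:
            return left & right
        changes += 1  # children disagree: one mutation is forced below v
        return left | right

    if char[root] not in states(top, root):
        changes += 1
    return changes

def parsimony(adj, chars):
    return sum(fitch_score(adj, ch) for ch in chars)

def component(adj, start, blocked):
    """Nodes reachable from `start` without passing through `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w != blocked and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def spr_moves(adj):
    """All SPR moves (p, c, edge): detach internal node p with the subtree
    behind neighbour c, then regraft p onto `edge` of the remaining tree."""
    moves = []
    for p in adj:
        if len(adj[p]) != 3:
            continue
        for c in sorted(adj[p]):
            a, b = sorted(w for w in adj[p] if w != c)
            rest = component(adj, a, p) | component(adj, b, p)
            edges = {tuple(sorted((v, w))) for v in rest for w in adj[v] if w in rest}
            edges.add(tuple(sorted((a, b))))  # edge created by splicing p out
            moves.extend((p, c, e) for e in sorted(edges))
    return moves

def apply_spr(adj, move):
    p, c, (x, y) = move
    new = {v: set(ns) for v, ns in adj.items()}
    a, b = (w for w in new[p] if w != c)
    new[a] = new[a] - {p} | {b}  # prune: splice p out, joining a and b
    new[b] = new[b] - {p} | {a}
    new[x] = new[x] - {y} | {p}  # regraft: subdivide edge (x, y) with p
    new[y] = new[y] - {x} | {p}
    new[p] = {c, x, y}
    return new

def mh_step(adj, chars, beta, rng):
    """Metropolis-Hastings step targeting pi(T) proportional to exp(-beta * PS(T))."""
    moves = spr_moves(adj)
    proposal = apply_spr(adj, rng.choice(moves))
    # Hastings factor N(T)/N(T'): different trees admit different numbers of moves.
    log_ratio = (-beta * (parsimony(proposal, chars) - parsimony(adj, chars))
                 + math.log(len(moves)) - math.log(len(spr_moves(proposal))))
    return proposal if rng.random() < math.exp(min(0.0, log_ratio)) else adj

leaves = "ABCDEF"
generating = caterpillar6(leaves)
# One character per internal edge of the generating tree (a single mutation each).
chars = [{leaf: int(leaf in side) for leaf in leaves} for side in ("AB", "ABC", "ABCD")]
rng = random.Random(0)
tree = caterpillar6("AECFBD")
trace = [parsimony(tree, chars)]
for _ in range(300):
    tree = mh_step(tree, chars, beta=3.0, rng=rng)
    trace.append(parsimony(tree, chars))
print("start:", trace[0], "best:", min(trace), "generating:", parsimony(generating, chars))
```

The Hastings factor is needed because the number of SPR moves N(T) varies with the shape of T. Every move from T to T' has a unique reverse move from T' to T, so the proposal ratio reduces to N(T)/N(T'). The seed, beta, and step count are arbitrary choices for illustration.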

Similar sources

Fast Convergence of MCMC Algorithms for Phylogenetic Reconstruction with Homogeneous Data on Closely Related Species

This paper studies a Markov chain for phylogenetic reconstruction which uses a popular transition between tree topologies known as subtree pruning-and-regrafting (SPR). We analyze the Markov chain in the simpler setting that the generating tree consists of very short edge lengths, short enough so that each sample from the generating tree (or character in phylogenetic terminology) is likely to h...

Spatial count models on the number of unhealthy days in Tehran

Spatial count data is usually found in most sciences such as environmental science, meteorology, geology and medicine. Spatial generalized linear models based on Poisson (Poisson-lognormal spatial model) and binomial (binomial-logitnormal spatial model) distributions are often used to analyze discrete count data in which spatial correlation is observed. The likelihood function of these models i...

Limitations of Markov chain Monte Carlo algorithms for Bayesian Inference of phylogeny

Markov Chain Monte Carlo algorithms play a key role in the Bayesian approach to phylogenetic inference. In this paper, we present the first theoretical work analyzing the rate of convergence of several Markov Chains widely used in phylogenetic inference. We analyze simple, realistic examples where these Markov chains fail to converge quickly. In particular, the studied data is generated from a ...

Phylogeny of Mixture Models: Robustness of Maximum Likelihood and Non-Identifiable Distributions

We address phylogenetic reconstruction when the data is generated from a mixture distribution. Such topics have gained considerable attention in the biological community with the clear evidence of heterogeneity of mutation rates. In our work we consider data coming from a mixture of trees which share a common topology, but differ in their edge weights (i.e., branch lengths). We first show the p...

Assessing the Convergence of Markov Chain Monte Carlo Methods for Bayesian Inference of Phylogenetic Trees

In biology, it is commonly of interest to investigate the ancestral pattern that gave rise to a currently existing group of individuals, such as genes or species. This ancestral pattern is frequently represented pictorially by a phylogenetic tree. Due to the growing popularity of Bayesian ...

Journal:
  • SIAM J. Discrete Math.

Volume 25

Publication date: 2011